Grammar Index by Induced Suffix Sorting

Authors

Abstract

We propose a new compressed text index built upon a grammar compression based on induced suffix sorting [Nunes et al., DCC’18]. We show that this grammar exhibits a locality sensitive parsing property, which allows us to specify, given a pattern \(P\), certain substrings of \(P\), called cores, that are similarly parsed in the text grammar whenever these occurrences are extensible to occurrences of \(P\). Supported by the cores, given a pattern of length \(m\), we can locate all its \(\text{occ}\) occurrences in a text \(T\) of length \(n\) within \(\mathcal{O}(m \lg |\mathcal{S}| + \text{occ}_C \lg |\mathcal{S}| \lg n + \text{occ})\) time, where \(\mathcal{S}\) is the set of all characters and non-terminals, \(\text{occ}\) is the number of occurrences, and \(\text{occ}_C\) is the number of occurrences of a chosen core \(C\) of \(P\) in the right-hand sides of all production rules of the grammar of \(T\). Our grammar index requires \(\mathcal{O}(g)\) words of space and can be built in \(\mathcal{O}(n)\) time using \(\mathcal{O}(g)\) working space, where \(g\) is the sum of the lengths of the right-hand sides of all production rules. We practically evaluate our proposed grammar index and show that it excels at locating long patterns in highly repetitive texts. Our implementation is available at https://github.com/TooruAkagi/GCIS_Index.
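
The abstract measures the index size in terms of \(g\), the total length of all right-hand sides. The Python sketch below illustrates that quantity on a hypothetical toy grammar (hand-written, not produced by GCIS). The encoding is our own assumption: a dict from integer non-terminal ids to right-hand sides, where a `str` item is a terminal character and an `int` item is a non-terminal. It computes \(g\) and expands the start symbol to confirm that the grammar derives the text.

```python
from typing import Dict, List, Union

# Hypothetical encoding: str = terminal character, int = non-terminal id.
Symbol = Union[str, int]

def grammar_size(rules: Dict[int, List[Symbol]]) -> int:
    """g: the sum of the lengths of all right-hand sides."""
    return sum(len(rhs) for rhs in rules.values())

def expand(rules: Dict[int, List[Symbol]], start: int) -> str:
    """Derive the text from `start` using an explicit stack (no recursion)."""
    out: List[str] = []
    stack: List[Symbol] = [start]
    while stack:
        sym = stack.pop()
        if isinstance(sym, str):
            out.append(sym)
        else:
            # Push the right-hand side reversed so its leftmost symbol pops first.
            stack.extend(reversed(rules[sym]))
    return "".join(out)

# Toy grammar: 0 -> ab, 1 -> 0 0 0 0, 2 -> 1 1 (start symbol 2).
RULES: Dict[int, List[Symbol]] = {0: ["a", "b"], 1: [0, 0, 0, 0], 2: [1, 1]}

print(grammar_size(RULES), len(expand(RULES, 2)))  # 8 16
```

Here \(g = 8\) while the derived text has length \(n = 16\); an index whose space is bounded in \(g\) rather than \(n\) benefits from exactly this kind of repetition.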

Similar resources

A Grammar Compression Algorithm based on Induced Suffix Sorting

We introduce GCIS, a grammar compression algorithm based on the induced suffix sorting algorithm SAIS, presented by Nong et al. in 2009. Our solution builds on the factorization performed by SAIS during suffix sorting. We construct a context-free grammar on the input string which can be further reduced into a shorter string by substituting each substring by its corresponding factor. The resulti...

In-Place Suffix Sorting

Given string T = T [1, . . . , n], the suffix sorting problem is to lexicographically sort the suffixes T [i, . . . , n] for all i. This problem is central to the construction of suffix arrays and trees with many applications in string processing, computational biology and compression. A bottleneck in these applications is the amount of workspace needed to perform suffix sorting beyond the spac...
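
To make the problem statement concrete, here is a deliberately naive Python sketch (0-indexed, unlike the 1-indexed \(T[1, \ldots, n]\) above). Python evaluates the sort key once per index and keeps the results, so every suffix exists as a separate slice at the same time: \(\Theta(n^2)\) characters of workspace, the kind of extra space beyond the input that the abstract identifies as a bottleneck.

```python
from typing import List

def naive_suffix_array(t: str) -> List[int]:
    """Sort suffix start positions by comparing the suffixes themselves.
    Materializes all n suffixes as slices: Theta(n^2) extra characters."""
    return sorted(range(len(t)), key=lambda i: t[i:])

print(naive_suffix_array("banana"))  # [5, 3, 1, 0, 4, 2]
```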

Faster suffix sorting

We propose a fast and memory efficient algorithm for lexicographically sorting the suffixes of a string, a problem that has important applications in data compression as well as string matching. Our algorithm eliminates much of the overhead of previous specialized approaches while maintaining their robustness for all kinds of input. For input size n, our algorithm operates in only two integer a...

Notes on Suffix Sorting

We study the problem of lexicographically sorting the suffixes of a string of symbols. In particular, we analyze the time complexity of Sadakane’s suffix sorting algorithm [8], showing that this is O(n log n) in the worst case. We also give a small improvement in the space requirements of this algorithm. We conclude that Sadakane’s algorithm, which has previously been shown to outperform the cl...
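
Sadakane's algorithm is built on prefix doubling: once suffixes are ranked by their first \(k\) characters, pairing each rank with the rank \(k\) positions later ranks them by their first \(2k\) characters. The Python sketch below is a plain prefix-doubling version for illustration only, not Sadakane's algorithm itself; it re-sorts all suffixes with a comparison sort in every round, so it runs in \(\mathcal{O}(n \log^2 n)\) time rather than the \(\mathcal{O}(n \log n)\) worst case discussed above.

```python
from typing import List

def suffix_array_doubling(t: str) -> List[int]:
    """Plain prefix doubling with a comparison sort per round."""
    n = len(t)
    sa = list(range(n))
    rank = [ord(c) for c in t]  # initial ranks: first character only
    k = 1
    while n > 0:
        # Key: (rank of first k chars, rank of next k chars); -1 means past the end.
        keys = [(rank[i], rank[i + k] if i + k < n else -1) for i in range(n)]
        sa.sort(key=lambda i: keys[i])
        # Re-rank: equal keys share a rank, so ranks now reflect the first 2k chars.
        new_rank = [0] * n
        for j in range(1, n):
            new_rank[sa[j]] = new_rank[sa[j - 1]] + (keys[sa[j]] != keys[sa[j - 1]])
        rank = new_rank
        if rank[sa[-1]] == n - 1:  # all ranks distinct: fully sorted
            break
        k *= 2
    return sa

print(suffix_array_doubling("banana"))  # [5, 3, 1, 0, 4, 2]
```

Replacing the comparison sort with a two-pass radix sort on the rank pairs makes each round linear, which gives the classic \(\mathcal{O}(n \log n)\) bound of Manber and Myers.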

Parallel Suffix Sorting

We present a parallel algorithm for lexicographically sorting the suffixes of a string. Suffix sorting has applications in string processing, data compression and computational biology. The ordered list of suffixes of a string stored in an array is known as Suffix Array, an important data structure in string processing and computational biology. Our focus is on deriving a practical implementati...
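
As an illustration of why the sorted order is useful: all suffixes that start with a pattern \(P\) of length \(m\) form one contiguous range of the suffix array, which two binary searches can find with \(\mathcal{O}(m \log n)\) character comparisons. The Python sketch below is the textbook suffix-array search, not the grammar index described at the top of this page; for brevity it builds the suffix array naively.

```python
from typing import List

def suffix_array(t: str) -> List[int]:
    """Naive construction; adequate for a small example."""
    return sorted(range(len(t)), key=lambda i: t[i:])

def locate(t: str, sa: List[int], p: str) -> List[int]:
    """Return all start positions of p in t, given the suffix array sa of t."""
    m = len(p)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose length-m prefix is >= p
        mid = (lo + hi) // 2
        if t[sa[mid]:sa[mid] + m] < p:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first suffix whose length-m prefix is > p
        mid = (lo + hi) // 2
        if t[sa[mid]:sa[mid] + m] <= p:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])

text = "banana"
print(locate(text, suffix_array(text), "ana"))  # [1, 3]
```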

Journal

Journal title: Lecture Notes in Computer Science

Year: 2021

ISSN: 1611-3349, 0302-9743

DOI: https://doi.org/10.1007/978-3-030-86692-1_8